The Statistics Of Super-Emitters: Modeling Heavy-Tailed Datasets As Power Laws

Author

  • Marc Mansfield
Abstract

Many observational datasets of emissions, including emissions from the oil and gas sector, follow heavy-tailed distributions, for which a small fraction of the measured emitters are much larger than more typical emitters, and account for a large fraction of the total measured emission. Such distributions are problematic, because they are expected to exhibit large, negatively biased sampling errors. As a result, there are very real concerns that current bottom-up emissions inventories may underestimate the true emission.

Based on the Generalized Central Limit Theorem, there are good reasons for expecting such datasets to obey power-law distribution functions, and I have been able to fit a number of datasets of methane in the environment with such distribution functions. Analyses of three datasets, comprising methane emissions from abandoned oil and gas wells in Pennsylvania, methane in soil gas near coal bed methane wells in Utah, and methane dissolved in ground water in West Virginia, respectively, are presented here.

I have also developed an error-analysis algorithm for such distributions. In calculations on artificial datasets, I have verified that the error-analysis algorithm works well. Unfortunately, it may require information about the underlying distribution that may not be available in real-world applications. Overcoming this drawback is a current focus of my research.

INTRODUCTION: HEAVY-TAILED DATASETS PROVIDE SPECIAL CHALLENGES FOR ESTIMATING EMISSIONS FROM THE OIL AND GAS SECTOR

Figures 1–3 represent three datasets of methane measurements in the environment.

Figure 1 shows emissions measured from abandoned oil and gas wells in Pennsylvania [Kang, et al., 2014]. Each vertical bar represents an individual measurement, of which there are a total of 38, arranged in order from smallest to largest. The measurements extend over several orders of magnitude, so the y-axis is logarithmic.
Some wells, represented at the left end of the chart, showed very little methane leakage, considerably less than 1 mg/hr. However, other wells were detected to have leakage rates approaching 10^5 mg/hr. The four highest wells, or about 10% of the total, are responsible for about 95% of the total leakage. The median emission is 29 mg/hr, while the mean is over 6000 mg/hr. The huge difference between mean and median occurs because a few wells at the high end of the dataset dominate the total.

Figure 2 shows methane concentrations in the soil gas near coal-bed methane wells in Utah [Stolp, et al., 2006]. These measurements were taken by inserting probes into the soil within 1 to 2 m of the wellhead, and to a depth of about 1 to 2 m. Again, each vertical bar represents an individual measurement. The bars in pink represent measurements assigned to the “tail” of the dataset and have been isolated for further analysis. The bars in light blue represent measurements not included in the tail. Measurements in the tail extend from 6 ppm to over 60,000 ppm. (The broad swath of measurements at 5 ppm was actually reported as “< 10 ppm,” the detection limit of many of the earlier measurements. Since they have been excluded from the tail, they have no impact on the subsequent analysis.) Again, the mean is much larger than the median.

Figure 3 shows concentrations of methane in the ground water in West Virginia, as measured in existing water wells [White and Mathes, 2006]. Again, the pink bars represent the “tail” of the measurement set. The “non-tail” measurements are again represented in light blue, and for this dataset, were all reported as 0. Since 0 cannot be represented on a log-plot, those particular data points have been assigned to their own axis, outside of the bar chart.
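
The gap between mean and median described above, and the tendency of small surveys to underestimate the total, can be reproduced on artificial data. The following is a minimal sketch, not the error-analysis algorithm mentioned in the abstract. It assumes a pure Pareto (power-law) distribution with tail exponent alpha = 1.2 and minimum value 1, chosen only for illustration and not fitted to any of the three datasets. The sample size of 38 mirrors the number of wells in Figure 1. The function names are hypothetical, and "negatively biased" is read here as "most surveys fall below the true mean", which is an interpretation rather than a definition taken from the text.

```python
import random
import statistics


def pareto_sample(n, alpha, x_min, rng):
    """Draw n values from a Pareto law with tail exponent alpha and minimum x_min."""
    # Inverse-transform sampling: X = x_min * U**(-1/alpha), with U uniform on (0, 1].
    return [x_min * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def top_share(values, fraction):
    """Fraction of the total contributed by the largest `fraction` of the values."""
    ordered = sorted(values, reverse=True)
    k = max(1, round(fraction * len(ordered)))
    return sum(ordered[:k]) / sum(ordered)


def underestimate_rate(n, alpha, x_min, trials, seed=0):
    """Share of simulated surveys whose sample mean falls below the true mean."""
    rng = random.Random(seed)
    true_mean = alpha * x_min / (alpha - 1.0)  # finite only when alpha > 1
    below = 0
    for _ in range(trials):
        if statistics.fmean(pareto_sample(n, alpha, x_min, rng)) < true_mean:
            below += 1
    return below / trials


if __name__ == "__main__":
    rng = random.Random(1)
    sample = pareto_sample(38, 1.2, 1.0, rng)  # one artificial "survey" of 38 emitters
    print("median:", statistics.median(sample))
    print("mean:", statistics.fmean(sample))
    print("share of total from top 10%:", top_share(sample, 0.10))
    print("fraction of surveys below true mean:",
          underestimate_rate(38, 1.2, 1.0, 2000))
```

Under these assumptions, most simulated surveys miss the rare very large emitters and therefore report a mean below the true value, while the occasional survey that does catch one overshoots by a wide margin. Changing `alpha` toward 2 weakens the effect, and values at or below 1 have no finite mean at all, so the comparison is only meaningful for alpha > 1.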

Publication date: 2015